Run the Setup.R file.
If everything works correctly, you should see a plot:
The qplot() function is the basic workhorse of ggplot2
The qplot() function has a basic syntax:
qplot(variables, plot type, dataset, options)
Objective: Explore the diamonds data set (preloaded along with ggplot2) using qplot for basic plotting.
The data set was scraped from a diamond exchange company's database. It contains the prices and attributes of over 50,000 diamonds.
What does the data look like?
Look at the top few rows of the diamonds data frame to find out!
head(diamonds)
## carat cut color clarity depth table price x y z
## 1 0.23 Ideal E SI2 61.5 55 326 3.95 3.98 2.43
## 2 0.21 Premium E SI1 59.8 61 326 3.89 3.84 2.31
## 3 0.23 Good E VS1 56.9 65 327 4.05 4.07 2.31
## 4 0.29 Premium I VS2 62.4 58 334 4.20 4.23 2.63
## 5 0.31 Good J SI2 63.3 58 335 4.34 4.35 2.75
## 6 0.24 Very Good J VVS2 62.8 57 336 3.94 3.96 2.48
Basic scatter plot of diamond price vs. carat weight:
qplot(carat, price, geom = "point", data = diamonds)
Scatter plot of diamond price vs. carat weight, showing the versatility of options in qplot:
qplot(carat, log(price), geom = "point", data = diamonds,
      alpha = I(0.2), color = color,
      main = "Log price by carat weight, grouped by color") +
  xlab("Carat Weight") + ylab("Log Price")
All of the “Your Turns” for this section will use the tips data set:
tips <- read.csv("https://bit.ly/2gGoiLR")
qplot(data = tips, x = total_bill, y = tip)
qplot(data = tips, x = total_bill, y = tip,
      color = smoker)
qplot(data = tips, x = total_bill, y = tip,
      color = smoker,
      xlab = "Total Bill ($)",
      ylab = "Tip ($)",
      main = "Tip left by patrons' total bill and smoking status")
To make a map, load up the states data and take a look:
states <- map_data("state")
head(states)
## long lat group order region subregion
## 1 -87.46201 30.38968 1 1 alabama <NA>
## 2 -87.48493 30.37249 1 2 alabama <NA>
## 3 -87.52503 30.37249 1 3 alabama <NA>
## 4 -87.53076 30.33239 1 4 alabama <NA>
## 5 -87.57087 30.32665 1 5 alabama <NA>
## 6 -87.58806 30.32665 1 6 alabama <NA>
What data is needed in order to plot a basic map?
The states data has all the necessary information:
A bunch of latitude and longitude points…
qplot(long, lat, geom = "point", data = states)
… that are connected with lines in a very specific order.
qplot(long, lat, geom = "path", data = states, group = group) +
  coord_map()
qplot(long, lat, geom = "polygon", data = states, group = group) +
  coord_map()
Use geom = "polygon" to treat states as solid shapes.
If a categorical variable is assigned as the fill color, then qplot will assign a different hue to each category.
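To see this hue assignment without the map data, here is a minimal sketch on two made-up squares. The data frame `squares` and its columns `zone` and `score` are hypothetical. The plots are written with ggplot() rather than qplot(), on the assumption that both choose the same default fill scales. layer_data() is used only to inspect the colors that were picked.

```r
library(ggplot2)

# Two unit squares as toy polygons (made-up data, not the states map).
squares <- data.frame(
  x     = c(0, 1, 1, 0, 2, 3, 3, 2),
  y     = c(0, 0, 1, 1, 0, 0, 1, 1),
  id    = rep(c("a", "b"), each = 4),
  zone  = rep(c("East", "West"), each = 4),  # categorical variable
  score = rep(c(1.5, 3.0), each = 4)         # numeric variable
)

# Categorical fill: one distinct hue per category.
p_zone <- ggplot(squares, aes(x, y, group = id, fill = zone)) +
  geom_polygon()

# Numeric fill: a continuous color gradient instead of separate hues.
p_score <- ggplot(squares, aes(x, y, group = id, fill = score)) +
  geom_polygon()

# Inspect the fill colors computed for the polygon vertices.
zone_fills <- unique(layer_data(p_zone)$fill)
print(zone_fills)
```

Printing `p_zone` and `p_score` side by side shows the difference between a discrete legend and a color bar.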
Load in a state regions dataset:
statereg <- read.csv("https://bit.ly/2i0AFHK")
head(statereg)
## State StateGroups
## 1 california West
## 2 nevada West
## 3 oregon West
## 4 washington West
## 5 idaho West
## 6 montana West
Join or merge the original states data with the new info.
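As a rough sketch of what such a merge does, the example below uses base R's merge() on two made-up tables. The names `map_points` and `lookup` and all their values are hypothetical stand-ins, not the workshop data. It illustrates two things: every row of the left table is kept, and the key columns can have different names in the two tables.

```r
# Toy stand-ins for the map points and the per-state lookup table (made-up values).
map_points <- data.frame(
  region = c("iowa", "iowa", "ohio", "utah"),
  order  = 1:4
)
lookup <- data.frame(
  State       = c("iowa", "ohio"),
  StateGroups = c("Midwest", "Midwest")
)

# all.x = TRUE keeps every row of map_points (a left join);
# by.x / by.y match key columns that have different names.
merged <- merge(map_points, lookup,
                by.x = "region", by.y = "State", all.x = TRUE)

# merge() sorts by the key by default, so restore the drawing order.
merged <- merged[order(merged$order), ]
print(merged)
```

The row for "utah" has no match in `lookup`, so its `StateGroups` value is NA. Row order matters for map data because polygons are drawn point by point in the sequence given by `order`.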
The left_join function is used for merging:
library(dplyr)
states.class.map <- left_join(states, statereg, by = c("region" = "State"))
head(states.class.map)
## long lat group order region subregion StateGroups
## 1 -87.46201 30.38968 1 1 alabama <NA> South
## 2 -87.48493 30.37249 1 2 alabama <NA> South
## 3 -87.52503 30.37249 1 3 alabama <NA> South
## 4 -87.53076 30.33239 1 4 alabama <NA> South
## 5 -87.57087 30.32665 1 5 alabama <NA> South
## 6 -87.58806 30.32665 1 6 alabama <NA> South
qplot(long, lat, geom = "polygon", data = states.class.map,
      group = group, fill = StateGroups, color = I("black")) +
  coord_map()
states.stats <- read.csv("https://bit.ly/2gT95Hc")
head(states.stats)
## state.name avg.wt avg.qlrest2 avg.ht avg.bmi avg.drnk
## 1 alabama 180.7247 9.051282 168.0310 29.00222 2.333333
## 2 alaska 189.2756 8.380952 172.0992 28.90572 2.323529
## 3 arizona 169.6867 5.770492 168.2616 27.04900 2.406897
## 4 arkansas 177.3663 8.226619 168.7958 28.02310 2.312500
## 5 california 170.0464 6.847751 168.1314 27.23330 2.170000
## 6 colorado 167.1702 8.134715 169.6110 26.16552 1.970501
states.map <- left_join(states, states.stats, by = c("region" = "state.name"))
head(states.map)
## long lat group order region subregion avg.wt avg.qlrest2
## 1 -87.46201 30.38968 1 1 alabama <NA> 180.7247 9.051282
## 2 -87.48493 30.37249 1 2 alabama <NA> 180.7247 9.051282
## 3 -87.52503 30.37249 1 3 alabama <NA> 180.7247 9.051282
## 4 -87.53076 30.33239 1 4 alabama <NA> 180.7247 9.051282
## 5 -87.57087 30.32665 1 5 alabama <NA> 180.7247 9.051282
## 6 -87.58806 30.32665 1 6 alabama <NA> 180.7247 9.051282
## avg.ht avg.bmi avg.drnk
## 1 168.031 29.00222 2.333333
## 2 168.031 29.00222 2.333333
## 3 168.031 29.00222 2.333333
## 4 168.031 29.00222 2.333333
## 5 168.031 29.00222 2.333333
## 6 168.031 29.00222 2.333333
Average number of days with insufficient sleep in the last 30 days:
qplot(long, lat, geom = "polygon", data = states.map,
      group = group, fill = avg.qlrest2) + coord_map()
states.sex.stats <- read.csv("https://srvanderplas.github.io/NPPD-Analytics-Workshop/02.Graphics/data/states.sex.stats.csv")
states.sex.stats <- read.csv("https://bit.ly/2hiKFIb")
head(states.sex.stats)
## state.name SEX avg.wt avg.qlrest2 avg.ht avg.bmi avg.drnk sex
## 1 alabama 1 198.8936 8.648936 177.5729 28.50714 3.033333 Male
## 2 alabama 2 173.0315 9.224771 163.9956 29.21280 2.041667 Female
## 3 alaska 1 203.3919 7.236111 178.3896 28.91494 2.487179 Male
## 4 alaska 2 169.5660 9.907407 163.1296 28.89286 2.103448 Female
## 5 arizona 1 191.3739 5.163793 177.1724 27.63152 2.814286 Male
## 6 arizona 2 156.2054 6.142857 162.7043 26.67683 2.026667 Female
states.sex.map <- left_join(states, states.sex.stats, by = c("region" = "state.name"))
head(states.sex.map)
## long lat group order region subregion SEX avg.wt
## 1 -87.46201 30.38968 1 1 alabama <NA> 1 198.8936
## 2 -87.46201 30.38968 1 1 alabama <NA> 2 173.0315
## 3 -87.48493 30.37249 1 2 alabama <NA> 1 198.8936
## 4 -87.48493 30.37249 1 2 alabama <NA> 2 173.0315
## 5 -87.52503 30.37249 1 3 alabama <NA> 1 198.8936
## 6 -87.52503 30.37249 1 3 alabama <NA> 2 173.0315
## avg.qlrest2 avg.ht avg.bmi avg.drnk sex
## 1 8.648936 177.5729 28.50714 3.033333 Male
## 2 9.224771 163.9956 29.21280 2.041667 Female
## 3 8.648936 177.5729 28.50714 3.033333 Male
## 4 9.224771 163.9956 29.21280 2.041667 Female
## 5 8.648936 177.5729 28.50714 3.033333 Male
## 6 9.224771 163.9956 29.21280 2.041667 Female
Average number of alcoholic drinks per day by state and gender:
qplot(long, lat, geom = "polygon", data = states.sex.map,
      group = group, fill = avg.drnk) + coord_map() +
  facet_grid(sex ~ .)
Use left_join to combine child healthcare data with maps information.
states.health.stats <- read.csv("https://bit.ly/2hRBMq0")
Use qplot to create a map of child healthcare undercoverage rate by state.
library(maps)
library(dplyr)
states <- map_data("state")
states.health.map <- left_join(states, states.health.stats,
by = c("region" = "state.name"))
# Use qplot to create a map of child healthcare undercoverage
# rate by state
qplot(data = states.health.map, x = long, y = lat,
      geom = 'polygon', group = group,
      fill = no.coverage) + coord_map()
Use ggplot2 options to clean up your map!
+ ggtitle(...)
+ theme_bw()
+ theme(...)
+ scale_fill_gradient2(...)
+ coord_map()
qplot(long, lat, geom = "polygon", data = states.map,
      group = group, fill = avg.drnk) +
  coord_map() + theme_bw() +
  scale_fill_gradient2(
    name = "Avg Drinks",
    limits = c(1.5, 3.5),
    low = "lightgray", high = "red") +
  theme(axis.ticks = element_blank(),
        axis.text = element_blank(),
        axis.title = element_blank()) +
  ggtitle("Average Number of Alcoholic Beverages\nConsumed Per Day by State")
Use options to polish the look of your map of child healthcare undercoverage rate by state!
qplot(data = states.health.map, x = long, y = lat,
      geom = 'polygon', group = group, fill = no.coverage) +
  coord_map() +
  scale_fill_gradient2(
    name = "Child\nHealthcare\nUndercoverage",
    limits = c(0, .2),
    low = 'white', high = 'red') +
  ggtitle("Health Insurance in the U.S.\n\nWhich states have the highest rates\nof undercovered children?") +
  theme_minimal() +
  theme(panel.grid = element_blank(),
        axis.text = element_blank(),
        axis.title = element_blank())
NOAA Data:
- National Oceanic and Atmospheric Administration
- Temperature and Salinity Data in the Gulf of Mexico
- Measured using Floats, Gliders and Boats
US Fisheries and Wildlife Data:
Both data sets have geographic coordinates for every observation
The NOAA data is stored in an .rdata file, so it has to be read with load() rather than read.csv().
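As a small illustration of how an .rdata file differs from a CSV, the sketch below saves a made-up data frame to a temporary file and loads it back. The object name `floats_demo` and the file name `demo.rdata` are hypothetical. The point is that load() restores objects under the names they were saved with, instead of returning a data frame to assign.

```r
# A made-up object to save (not the real NOAA data).
floats_demo <- data.frame(callSign = "Q0000000", Depth = c(2, 4))

# Write it to a temporary .rdata file, then remove it from the workspace.
path <- file.path(tempdir(), "demo.rdata")
save(floats_demo, file = path)
rm(floats_demo)

# load() recreates the saved objects and returns their names.
loaded_names <- load(path)
print(loaded_names)
head(floats_demo)
```

Because of this, one file can carry several data frames at once.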
Use the getwd() command to find your current working directory.
load("noaa.rdata")
Take a peek at the top of the floats NOAA data:
head(floats, n = 2)[,1:5]
## callSign Date_Time JulianDay Time_QC Latitude
## 1 Q4901043 7/12/2010 2455390 1 24.823
## 2 Q4901043 7/12/2010 2455390 1 24.823
head(floats, n = 2)[,6:10]
## Longitude Position_QC Depth Depth_QC Temperature
## 1 -87.964 1 2 1 29.83
## 2 -87.964 1 4 1 29.65
head(floats, n = 2)[,11:14]
## Temperature_QC Salinity Salinity_QC Type
## 1 1 36.59 1 Float
## 2 1 36.58 1 Float
qplot(Longitude, Latitude, color = callSign, data = floats) +
  coord_map()
qplot(Longitude, Latitude, color = callSign, data = gliders) +
  coord_map()
qplot(Longitude, Latitude, color = callSign, data = boats) +
  coord_map()
These data sets share the same context: a common time and a common place.
With ggplot2:
ggplot() +
  geom_path(data = states, aes(x = long, y = lat, group = group)) +
  geom_point(data = floats, aes(x = Longitude, y = Latitude, color = callSign)) +
  geom_point(aes(x, y), shape = "x", size = 5, data = rig) +
  geom_text(aes(x, y), label = "BP Oil Rig",
            size = 5, data = rig, hjust = -0.1) +
  xlim(c(-91, -80)) + ylim(c(22, 32)) + coord_map()
To do this we need to understand a little more about the underlying theory…
Data: floats, states
Mappings: x = longitude, y = latitude, color = callSign
Scales: continuous x and y positions, discrete colors
Geoms: Points (floats), lines (states)
Faceting: None
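The components listed above can be tried out on toy data when the NOAA file is not at hand. This is a minimal sketch: `outline` and `obs` are made-up stand-ins for the states and floats data, and layer_data() is used only to confirm that each layer carries its own data set.

```r
library(ggplot2)

# Made-up stand-ins for the states outline and the float positions.
outline <- data.frame(
  long  = c(-91, -80, -80, -91, -91),
  lat   = c(22, 22, 32, 32, 22),
  group = 1
)
obs <- data.frame(
  Longitude = c(-88, -85, -86),
  Latitude  = c(25, 27, 29),
  callSign  = c("A", "A", "B")
)

# Each layer names its own data, mappings, and geom;
# default scales are filled in automatically.
p <- ggplot() +
  geom_path(data = outline, aes(x = long, y = lat, group = group)) +
  geom_point(data = obs, aes(x = Longitude, y = Latitude, color = callSign))

# One row per outline vertex in layer 1, one row per observation in layer 2.
n_path_rows  <- nrow(layer_data(p, 1))
n_point_rows <- nrow(layer_data(p, 2))
```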
qplot vs ggplot
qplot() stands for “quickplot”
ggplot() stands for “grammar of graphics plot”
Two ways to construct the same plot for float locations:
qplot(Longitude, Latitude, color = callSign, data = floats)
Or:
ggplot(data = floats,
       aes(x = Longitude, y = Latitude, color = callSign)) +
  geom_point() +
  scale_x_continuous() +
  scale_y_continuous() +
  scale_color_discrete()
Even ggplot will automatically pick default scales:
ggplot(data = floats,
       aes(x = Longitude, y = Latitude, color = callSign)) +
  geom_point()
Find the ggplot() statement that creates this plot:
Hint: look at the Floats data for variable ideas
ggplot(aes(x = Depth, y = Temperature, color = callSign),
       data = floats) +
  geom_point()
A layer added to ggplot() can be a geom…
… or a position adjustment to the scales
| Plot | Geom | Stat |
|---|---|---|
| Scatterplot | point | identity |
| Histogram | bar | bin count |
| Smoother | line + ribbon | smoother function |
| Binned Scatterplot | rectangle + color | 2d bin count |
More geoms described at http://docs.ggplot2.org/current/
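The histogram row of the table can be checked directly. The sketch below draws the same histogram twice on the diamonds data, once through the geom and once through the stat; the choice of `price` and of 10 bins is arbitrary.

```r
library(ggplot2)

# A histogram is a bar geom drawn on the output of the bin stat.
p_hist <- ggplot(diamonds, aes(x = price)) +
  geom_histogram(bins = 10)

# The same plot, spelled out as a stat with an explicit geom.
p_bins <- ggplot(diamonds, aes(x = price)) +
  stat_bin(geom = "bar", bins = 10)

# Both layers compute the same bin counts.
hist_counts <- layer_data(p_hist)$count
bins_counts <- layer_data(p_bins)$count
```

Every geom has a default stat and every stat a default geom, which is why either spelling works.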
Build a map using NOAA data
ggplot() +
  geom_path(data = states, aes(x = long, y = lat, group = group)) +
  geom_point(data = floats, aes(x = Longitude, y = Latitude, color = callSign)) +
  geom_point(aes(x, y), shape = "x", size = 5, data = rig) +
  geom_text(aes(x, y), label = "BP Oil Rig", size = 5, data = rig, hjust = -0.1) +
  xlim(c(-91, -80)) +
  ylim(c(22, 32)) + coord_map()
animal <- read.csv("https://bit.ly/2hNlTUl")
library(lubridate)
animal$month <- month(as.Date(animal$Date_))
ggplot() +
  geom_path(data = states, aes(x = long, y = lat, group = group)) +
  geom_point(data = animal, aes(x = Longitude, y = Latitude)) +
  xlim(c(-91, -80)) + ylim(c(24, 32)) + coord_map()
ggplot() +
  geom_path(data = states, aes(x = long, y = lat, group = group)) +
  geom_point(data = animal, aes(x = Longitude, y = Latitude,
                                color = class)) +
  xlim(c(-91, -80)) + ylim(c(24, 32)) + coord_map()
ggplot() +
  geom_path(data = states, aes(x = long, y = lat, group = group)) +
  geom_point(data = animal, aes(x = Longitude, y = Latitude,
                                color = Condition)) +
  xlim(c(-91, -80)) + ylim(c(24, 32)) + coord_map()
ggplot() +
  geom_path(data = states, aes(x = long, y = lat, group = group)) +
  geom_point(data = animal, aes(x = Longitude, y = Latitude,
                                color = Condition), alpha = .5) +
  xlim(c(-91, -80)) + ylim(c(24, 32)) +
  facet_wrap(~month) + coord_map()
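The faceted map depends on a month column derived from a date column. The sketch below shows that derivation in base R rather than lubridate, on made-up date strings. The month/day/year layout is an assumption for illustration only; the actual format of `Date_` may differ.

```r
# Made-up date strings in month/day/year form (the real Date_ format may differ).
dates_raw <- c("7/12/2010", "8/03/2010", "11/25/2010")

# Without a format, as.Date() expects year-first input such as "2010-07-12",
# so other layouts need an explicit format string.
dates <- as.Date(dates_raw, format = "%m/%d/%Y")

# Month number as an integer, usable as a faceting variable.
months <- as.integer(format(dates, "%m"))
print(months)
```

If the conversion produces NA values, the format string does not match the input and should be adjusted before faceting.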